Aim 1: Predict the degree of improvement in ruminative and depressive symptoms (RRS and HDRS-6).
Aim 2: Determine which treatment-specific models also predict outcomes across treatment arms.
| group | n | mean age | SD age | males (n) |
|---|---|---|---|---|
| e | 26 | 40.0 | 13.7 | 12 |
| k | 49 | 38.9 | 10.8 | 25 |
| s | 52 | 32.5 | 11.2 | 23 |
| | Df | Sum Sq | Mean Sq | F value | Pr(>F) |
|---|---|---|---|---|---|
| group | 2 | 2.583 | 1.292 | 11.13 | 1.876e-05 |
| scale | 3 | 7.736 | 2.579 | 22.23 | 1.694e-13 |
| group:scale | 6 | 1.757 | 0.2928 | 2.524 | 0.02047 |

Table: Two-way ANOVA: treatment-by-scale changes
(The post-hoc comparisons table failed to render: the chunk at lines 51-85 of NARSAD_Aim_1_Summary.Rmd calls `pander()` on the significant `group:scale` rows of a post-hoc object `ph` — likely Tukey HSD output, given the `p adj` column — but `ph` was not defined in scope.)
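A minimal sketch of how the missing `ph` object could be produced, assuming a long-format data frame `dat` with columns `change`, `group`, and `scale` (these names are illustrative, not from the source):

```r
# Two-way ANOVA of symptom change by treatment group and symptom scale
fit <- aov(change ~ group * scale, data = dat)
summary(fit)

# Tukey HSD post-hoc comparisons; `ph` is the object the failed chunk expects
ph <- TukeyHSD(fit)

# Keep only the significant group-by-scale contrasts (adjusted p < .05)
sig <- ph$`group:scale`[ph$`group:scale`[, "p adj"] < 0.05, ]
pander::pander(sig)
```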
Goal: Train and test several classifier families (random forests, radial-kernel SVMs, gradient boosted trees) to narrow down which perform best.
Result: Gradient boosted trees consistently outperform both random forests and radial SVMs
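The three-way comparison could be sketched with `caret` under a shared resampling scheme (the method strings are standard caret identifiers; `dat` and the outcome `y` are placeholders, not names from the source):

```r
library(caret)

# One resampling scheme so the three model families are compared fairly
ctrl <- trainControl(method = "repeatedcv", number = 10, repeats = 5)

fits <- lapply(
  c(rf = "rf", svm = "svmRadial", gbt = "xgbTree"),
  function(m) train(y ~ ., data = dat, method = m, trControl = ctrl)
)

# Compare resampled performance across the three families
summary(resamples(fits))
```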
Result: A correlation threshold between 0.1 and 0.3 looks best; a more refined grid search using boosted trees and a strict cutoff can now be done.
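A minimal sketch of the correlation-threshold screen, assuming predictors are retained when their absolute correlation with the outcome exceeds the cutoff (the exact screening rule is not stated in the source):

```r
# Keep predictors whose |correlation| with the outcome exceeds `cutoff`
screen_features <- function(X, y, cutoff = 0.1) {
  r <- apply(X, 2, function(col) cor(col, y, use = "pairwise.complete.obs"))
  X[, abs(r) > cutoff, drop = FALSE]
}
```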
Gradient boosted trees have several important parameters to tune:
Because a brute-force grid search over all the model-specific parameters and hyperparameters is infeasible, I'll tune some of the larger knobs first…
Result: Differences are more pronounced for models excluding baseline severity. I’ll restrict the nrounds range to: {50, 100, 500}
Result: Given the above, I restricted the max tree depth range to 2:6.
Result: No clear winner here; I restricted the minimum child weight range to 1:2.
Result: The biggest effect is for models including baseline symptoms, but the general trend is strongly downward for all, so I set eta to 0.025.
Result: No clear effect; I set gamma to 0.3.
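Collecting the restrictions above, the reduced search space can be sketched with caret's `xgbTree` interface. The grid values are inferred from the results above and the best-model table; step sizes, the colsample value, and the `dat`/`y` names are assumptions, not from the source:

```r
library(caret)

# Reduced xgboost grid after the coarse tuning passes above
grid <- expand.grid(
  nrounds          = c(50, 100, 500),
  max_depth        = c(2, 4, 6),   # within the stated 2:6 range
  eta              = 0.025,
  gamma            = 0.3,
  colsample_bytree = 0.4,          # assumed, per the best-model table
  min_child_weight = 1:2,
  subsample        = c(0.5, 1)
)

ctrl <- trainControl(method = "repeatedcv", number = 10, repeats = 5)
fit  <- train(y ~ ., data = dat, method = "xgbTree",
              trControl = ctrl, tuneGrid = grid)
```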
Results: Performance varies considerably across scales, treatments, and parameterizations, suggesting that no single parameterization will be optimal for all outcomes and treatment arms. Below I outline the best-performing models within this restricted grid search.
| model | cutoff | baseline | y | nrounds | eta | maxdepth | gamma | colsamp | childweight | subsamp | variable | PvAc Score |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| hdrs6.perf_e | 0.3 | FALSE | hdrs6 | 500 | 0.025 | 6 | 0.3 | 0.4 | 1 | 1 | perf_e | 0.2727997 |
| rrsb.perf_e | 0.1 | FALSE | rrsb | 100 | 0.025 | 2 | 0.3 | 0.4 | 1 | 1 | perf_e | 0.5374308 |
| rrsr.perf_e | 0.1 | FALSE | rrsr | 500 | 0.025 | 2 | 0.3 | 0.4 | 2 | 1 | perf_e | 0.2085911 |
| tcqr.perf_e | 0.1 | FALSE | tcqr | 500 | 0.025 | 6 | 0.3 | 0.4 | 2 | 0.5 | perf_e | 0.3676297 |
| hdrs6.perf_k | 0.3 | FALSE | hdrs6 | 500 | 0.025 | 2 | 0.3 | 0.4 | 2 | 0.5 | perf_k | 0.0636506 |
| rrsb.perf_k | 0.1 | FALSE | rrsb | 100 | 0.025 | 4 | 0.3 | 0.4 | 2 | 1 | perf_k | 0.1596519 |
| rrsr.perf_k | 0.3 | FALSE | rrsr | 100 | 0.025 | 2 | 0.3 | 0.4 | 2 | 1 | perf_k | 0.3104615 |
| tcqr.perf_k | 0.1 | FALSE | tcqr | 100 | 0.025 | 2 | 0.3 | 0.4 | 2 | 1 | perf_k | -0.0265675 |
| hdrs6.perf_s | 0.1 | FALSE | hdrs6 | 100 | 0.025 | 6 | 0.3 | 0.4 | 2 | 1 | perf_s | -0.0618437 |
| rrsb.perf_s | 0.3 | FALSE | rrsb | 500 | 0.025 | 2 | 0.3 | 0.4 | 1 | 1 | perf_s | 0.7181260 |
| rrsr.perf_s | 0.3 | FALSE | rrsr | 100 | 0.025 | 4 | 0.3 | 0.4 | 2 | 1 | perf_s | 0.4281529 |
| tcqr.perf_s | 0.1 | FALSE | tcqr | 500 | 0.025 | 2 | 0.3 | 0.4 | 2 | 0.5 | perf_s | 0.1539772 |
*(Nine 2 × 3 multi-panel figures rendered here; only the `TableGrob` layout listings survived extraction.)*
Result: Gradient boosted trees seem to generalize slightly better, on average.
Results: Including baseline symptom severity greatly improves model generalizability, as expected. However, the performance distributions of several models across treatment arms remain consistently above zero even when baseline severity is excluded. Also, the models optimized for within-arm performance (green diamonds) are not always the best at generalizing.